refactor(series)!: 🕰️ drop TimestampSeries #1274
@cmp0xff you have a number of PRs submitted while I was out on vacation for 2 weeks. Can you let me know which ones I should prioritize for review?
Hi @Dr-Irv, I hope you had a nice vacation. My pull requests are categorised below. Each category is independent, but those in a higher position have a slightly higher priority in my opinion.
Thanks for doing this. It's a lot of good work.
Main thing - if I'm going to merge this PR, it needs to be in a state where we don't need the followup PR.
Basic rule - we don't put ignore in the tests unless we are testing that the stubs should not accept something that is invalid. You have places where you have added ignore in the tests and I won't merge that in (unless we know it is a bug in the type checker).
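The rule above can be sketched with a toy function. This is only an illustration: `double` and the locally defined `TYPE_CHECKING_INVALID_USAGE` flag are hypothetical stand-ins, and the sketch assumes the type checkers are configured to report unnecessary ignore comments, so that an ignore on a line that is actually accepted would itself be flagged.

```python
# Hypothetical stand-in for the flag a test suite would import. It is always
# False, so the guarded block is analysed by type checkers but never executed.
TYPE_CHECKING_INVALID_USAGE = False


def double(x: int) -> int:
    """Toy function standing in for a stubbed API."""
    return 2 * x


def test_double() -> None:
    # Valid usage: no ignore comments are allowed here.
    result = double(2)
    assert result == 4

    if TYPE_CHECKING_INVALID_USAGE:
        # Invalid usage: the ignore comments document that the annotations
        # are expected to reject this call.
        double("a")  # type: ignore[arg-type] # pyright: ignore[reportArgumentType]


test_double()
```

An ignore comment outside the guarded block would hide a gap in the annotations instead of testing them, which is the situation the rule is meant to prevent.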
Thank you very much for your quick and thorough reviews. I will be able to work on them next week.
If I understand correctly, we can do the experiment below: (it's an
# Lets Se[int] be referenced in annotations inside the class body.
from __future__ import annotations

from typing import Any, overload, Generic, reveal_type
from typing_extensions import TypeVar, Never, Self

class PUnknown: ...

T = TypeVar("T", bound=int | PUnknown, default=PUnknown)

class Se(Generic[T]):
    @overload
    def __sub__(self: Se[int], other: Se[int]) -> Never: ...
    @overload
    def __sub__(self, other: Self) -> Self: ...

def foo(a: Se[int]) -> Se[int]: ...

reveal_type(foo(Se[PUnknown]()))  # mypy, pyright: cannot assign
reveal_type(Se[PUnknown]() - Se[PUnknown]())  # mypy, pyright: Se[PUnknown]
I think it will be a big change, worthy of a separate PR, if we do it. In particular,
I played with the idea. Seems like a lot of work. The subtype argument is what makes it a problem. So let's not go down that path.
I actually don't think that pyright is wrong by saying I wish the typing spec allowed you to have
I see your point. So where does that leave us? What are our options? This might be the argument to keep
The following is my understanding:
The choice is between (2) and (3). I'm not sure with (2) how to handle the
With respect to (3), I think the issue here is
Having said that, you're saying "we need to change our philosophy", so can you be more clear on how you would document that?
Note - I see you requested another review. I'm traveling, so can't do that for a few days, but we need to resolve this discussion first anyway.
Hi @Dr-Irv,
I think my previous example can do it. Let me make it even shorter (note it is a
from __future__ import annotations
from typing import (
    Any,
    overload,
    Generic,
    Never,
    reveal_type,
    Self,
)
from typing_extensions import TypeVar

T = TypeVar("T", int, str)

class Se(Generic[T]):
    @overload
    def __sub__(self: Se[int], other: Se[int]) -> Never: ...
    @overload
    def __sub__(self, other: Self) -> Self: ...

def t1() -> None:
    reveal_type(Se[Any]() - Se[Any]())  # mypy: Any, pyright: Never.
from __future__ import annotations
from typing import (
    Any,
    Generic,
    overload,
    reveal_type,
)
from typing_extensions import TypeVar

T = TypeVar("T", int, str)

class Se(Generic[T]):
    @overload
    def __sub__(self: Se[int], other: Se[int]) -> Se[int]: ...
    @overload
    def __sub__(self, other: Se) -> Se: ...

def foo(a: Se[int]) -> Se[int]: ...

def t1() -> None:
    reveal_type(Se[Any]() - Se[str]())  # mypy: Se[Any], pyright: Se[Unknown].
The logic of
Probably, asking the
In terms of a Python script, we currently have the following in
from typing import cast, reveal_type
import pandas as pd

frame = pd.DataFrame({"timestamp": [pd.Timestamp(2025, 8, 26)], "tag": ["one"], "value": [1.0]})
timestamps = frame["timestamp"]
reveal_type(timestamps)  # type checker: Series[Any], runtime: Series
reveal_type(timestamps - pd.Timestamp(2025, 7, 12))  # type checker: Unknown and error, runtime: Series
reveal_type(cast("TimestampSeries", timestamps) - pd.Timestamp(2025, 7, 12))  # type checker: TimedeltaSeries, runtime: Series
tags = frame["tag"]
reveal_type("suffix" + tags)  # type checker: Never, runtime: Series
After changing our philosophy, we will have something like
from typing import cast, reveal_type
import pandas as pd

frame = pd.DataFrame({"timestamp": [pd.Timestamp(2025, 8, 26)], "tag": ["one"], "value": [1.0]})
timestamps = frame["timestamp"]
reveal_type(timestamps)  # type checker: Series[Any], runtime: Series
reveal_type(timestamps - pd.Timestamp(2025, 7, 12))  # type checker: Series[Any] or Series[Unknown], runtime: Series
reveal_type(cast("Series[Timestamp]", timestamps) - pd.Timestamp(2025, 7, 12))  # type checker: TimedeltaSeries, runtime: Series
tags = frame["tag"]
reveal_type("suffix" + tags)  # type checker: Series[Any] or Series[Unknown], runtime: Series
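The "runtime: Series" half of the comments above can be checked directly. The following is a minimal sketch, assuming pandas is installed. It only looks at run-time behaviour and says nothing about what a type checker infers; the variable names `deltas` and `tagged` are introduced here for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {"timestamp": [pd.Timestamp(2025, 8, 26)], "tag": ["one"], "value": [1.0]}
)

# Subtracting a Timestamp from a datetime column gives a timedelta column.
deltas = frame["timestamp"] - pd.Timestamp(2025, 7, 12)
print(type(deltas).__name__, deltas.dtype)

# Reflected addition of a str to a string column concatenates element-wise.
tagged = "suffix" + frame["tag"]
print(type(tagged).__name__, tagged.iloc[0])
```

Both operations succeed at run time, which is why a result of Never or an error from the type checker is misleading for a column whose element type is unknown.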
@cmp0xff I created a pyright issue: microsoft/pyright#10924
Thanks for your analysis on how we'd be changing our philosophy if we chose option (3).
The way I see it is that option (2) is providing more error checking for users than option (3). However, with option (2), you are also forced to say that
I'm leaning towards (3), but let's see what the
So it appears that it's a known bug in
Casting approach
One of our aims is to help the user recognise potential mistakes. When it comes to
Users are asked to
This is no longer viable if we want to drop
Passive approach
We could follow the examples from native Python types. Consider the following
from datetime import datetime
from typing import Any, reveal_type

a: Any
b: datetime
reveal_type(a + "test")  # Any
reveal_type(a - b)  # Any
reveal_type(a * b)  # Any
The typing is quite passive, especially in the third case, where
However, this approach ensures maximum extensibility for subclasses. In this approach, the examples from the casting approach become
Progressive approach
This is developed from my "consistent plan" in #1343 (comment). In contrast to the casting approach, we do
The argument is that if the arithmetic works at run time, the resulting type is the only valid type. Static type checking still helps users, because they can see when the progressively given resulting type is nonsense. If the valid resulting type is not unique, we give
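The progressive idea can be sketched with a toy generic class. This is an illustration only: `Se` is a hypothetical stand-in, not the pandas-stubs definition, and the overloads are an assumption about how such a rule might be written. When the element type is unknown but the right operand is a `datetime`, the only result that works at run time is a series of `timedelta` values, so that is the declared return type. When several results are possible, the declared result stays unparametrised.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Any, Generic, TypeVar, overload

T = TypeVar("T")


class Se(Generic[T]):
    """Toy stand-in for a generic series (hypothetical, for illustration)."""

    def __init__(self, values: list[T]) -> None:
        self.values = values

    # Right operand is a datetime: only datetime-like elements can work at
    # run time, so the result is declared as a series of timedeltas.
    @overload
    def __sub__(self, other: datetime) -> Se[timedelta]: ...
    # Right operand is another series: the result is not unique, so it is
    # left unparametrised.
    @overload
    def __sub__(self, other: Se[Any]) -> Se[Any]: ...
    def __sub__(self, other: Any) -> Any:
        if isinstance(other, Se):
            return Se([a - b for a, b in zip(self.values, other.values)])
        return Se([a - other for a in self.values])


# The element type is deliberately declared as unknown.
untyped: Se[Any] = Se([datetime(2025, 8, 26)])
deltas = untyped - datetime(2025, 7, 12)
print(deltas.values)
```

With these overloads a type checker has a concrete `Se[timedelta]` to report for `deltas`, and a user who expected something else would notice the mismatch in later code.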
I'm fine with this approach. Let me know when I should review.
Hi @Dr-Irv, I hope I have addressed all discussions around the stub files. philosophy.md remains to be updated later in this PR.
Let me know when you have everything updated (including
Some small things and 2 larger things:
- There are some tests that you removed that should work at runtime - inconsistencies in pandas interpretation that you can subtract a single datetime, but not a list of datetimes, so I'd like you to create an issue in pandas about any of those.
- There are changes here that are using Series[Timedelta], but I'd like to have this PR only deal with Series[Timestamp] but still use TimedeltaSeries whenever we have Series[Timedelta]. Then the other PR can fix that.
_0 = left_ts - s  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
_1 = left_ts - a  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
_2 = left_td - s  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
_3 = left_td - a  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
I'm a little puzzled here. We can see that Series[Any] - list[datetime] is invalid, but why can't we see that Series[Any] - datetime is invalid?
Or maybe Series[Any] - list[datetime] should be allowed??
Same comment for the reverse operation and the .sub()
I think the issue here is that pandas is inconsistent, so can you report that in pandas?
At run time,
- Series[Timestamp] - list[datetime] is not implemented. ENH: arithmetic between DatetimeArray and list pandas#62353
- Series[Timestamp] - datetime is valid.
from datetime import datetime
from typing import assert_type
import pandas as pd

arr = pd.to_datetime(["2020-01-01", "2020-01-02"]).array
assert isinstance(arr, pd.arrays.DatetimeArray)
print(arr - datetime(2020, 1, 1))
Can you put back the tests that were there? If we don't catch them in type checking, add a comment referring to the pandas issue.
if TYPE_CHECKING_INVALID_USAGE:
    _0 = left + s  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
    _a = left + d  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
I think Series[Timestamp] + Sequence[timedelta] should be valid.
Pandas does not support adding a list to a DatetimeArray: pd.Series([pd.Timestamp("2025-01-01")]) + [pd.Timedelta(1, "s")] gives TypeError: unsupported operand type(s) for +: 'DatetimeArray' and 'list'. See pandas-dev/pandas#62353.
OK - can you put a comment in here that says this should work, so we are currently detecting it as a typing error, and refer to the pandas issue?
I think the only thing that's now needed is some comments related to the things that are not working in pandas but we'd like to test. That means putting some tests back that you deleted with appropriate comments.
Ideally, if we believe it should work, but pandas says it doesn't, then we should have the type checker catch it until pandas fixes things (if ever). So I'd rather keep the tests that you deleted in tests/series/arithmetic/test_sub.py and add any appropriate comments that point to the pandas issues.
left.sub(s)
left.sub(d)

left.rsub(s)  # type: ignore[call-overload] # pyright: ignore[reportArgumentType,reportCallIssue]
left.rsub(d)  # type: ignore[call-overload] # pyright: ignore[reportArgumentType,reportCallIssue]
left.rsub(s)
The fact that these work is part of the inconsistency. Can you add a comment that points to the pandas issue 62353?
_0 = left - s  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
_a = left - d  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]

_1 = s - left  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
Can you add a comment that describes this?
- TimestampSeries, TimedeltaSeries, etc. can be removed #718
- assert_type() to assert the type of any return value